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Abstract.  We  present  a  line  search  multigrid  method  based  on  Nash’s  MG/OPT  multilevel 
optimization  approach  for  solving  discretized  versions  of  convex  infinite  dimensional  optimization 
problems.  Global  convergence  is  proved  under  fairly  minimal  requirements  on  the  minimization 
method  used  at  all  grid  levels.  In  particular,  our  convergence  proof  does  not  require  that  these 
minimization,  or  so-called  “smoothing”  steps,  which  we  interpret  in  the  context  of  optimization, 
be  taken  at  each  grid  level  in  contrast  with  multigrid  algorithms  for  PDEs,  which  fail  to  converge 
without  such  steps.  Preliminary  numerical  experiments  show  that  our  method  is  promising. 

Key  words,  convex  optimization,  multigrid  method,  line  search,  global  convergence 

AMS  subject  classifications.  65K05,  65N55,  90C06,  90C25 


1.  Introduction.  Infinite  dimensional  optimization  problems  are  a  major  source 
of  large-scale  finite  dimensional  optimization  problems  [13,  27].  Since  it  is  not  pos¬ 
sible  or  very  hard  to  obtain  explicit  solutions  for  these  problems,  they  are  usually 
solved  numerically  either  by  a  adiscretize-then-optimize”  strategy  or  an  “optimize- 
then-discretize”  strategy.  In  this  paper,  we  follow  the  first  strategy  and  consider  a 
class  of  problems  whose  discretized  versions  have  the  form: 

(1-1)  min  fh{xh) 

where  h  is  an  index  used  to  specify  the  resolution  or  discretization  of  the  optimization 
problem,  Xh  is  a  vector  of  dimension  rih  and  fh  is  a  real  valued  and  twice  continuously 
differentiable  convex  function  on  a  domain  tth  G  Rn'1 . 

Multigrid  methods  [9,  11,  12,  19,  25,  34,  35,  36]  are  iterative  methods  that  were 
originally  proposed  for  linear  elliptic  partial  differential  equations  (PDEs).  In  this 
approach,  coarser  grid  corrections  are  recursively  imbedded  in  an  iterative  process, 
in  combination  with  so  called  “relaxation”  or  “smoothing”  steps,  to  accelerate  the 
convergence  on  the  target  grid.  Several  extensions  for  nonlinear  PDEs  have  been  well 
studied.  One  is  the  global  linearization  method  [20,  34],  which  uses  the  multigrid 
method  within  Newton’s  method  for  nonlinear  equations  to  solve  the  system  of  linear 
equations  that  provides  the  Newton  step  at  each  iteration.  The  second  is  the  local 
linearization  method,  such  as  the  full  approximation  scheme  (FAS)  [10]  and  the  closely 
related  nonlinear  multigrid  method  (NMGM)  [19],  in  which  the  multigrid  methodology 
is  directly  applied  to  the  original  system  of  nonlinear  equations  and  its  corresponding 
system  of  nonlinear  residual  equations.  A  combination  of  global  and  linearization  is 
studied  in  [37]  and  a  projection  multilevel  method  is  proposed  for  quasilinear  elliptic 
PDEs  in  [23,  24,  26],  where  the  system  of  nonlinear  equations  is  reformulated  as  a 
least-squares  problem. 
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Multigrid  methods  for  infinite  dimensional  optimization  problems  have  also  re¬ 
ceived  considerable  attention  [1,  3,  4,  6,  7,  14,  33].  However,  until  recently  the  essential 
thrust  of  these  methods  was  based  on  employing  multigrid  methods  for  solving  the 
nonlinear  equations  derived  from  the  optimality  condition  of  problem  (1.1).  In  a  new 
approach,  Nash  [21]  (see  also  [22,  29])  proposed  a  multigrid  optimization  framework 
for  solving  problem  (1.1),  where  fh(xh)  is  a  convex  function  of  Xh-  A  proof  of  the 
global  convergence  of  Nash’s  method  was  given  in  [5].  This  proof  requires  that  at 
least  one  iteration  of  the  optimization  algorithm  that  is  used  at  each  level  be  per¬ 
formed  either  before  going  to  a  coarser  level  or  after  returning  from  a  coarser  level 
during  a  multigrid  cycle.  These  iterations  of  the  optimization  algorithm  are  similar  to 
prior  smoothing  or  post  smoothing  steps  in  multigrid  methods  for  PDEs.  Expanding 
on  Nash’s  approach,  Gratton,  Sartenaer  and  Toint  [18,  16,  17]  proposed  a  recursive 
trust  region  method  that  converges  to  a  first-order  optimal  point  without  doing  such 
smoothing  steps  at  each  multigrid  cycle. 

In  this  paper,  we  propose  a  line  search  multigrid  optimization  method  that  adopts 
some  of  the  features  of  both  Nash’s  method  and  the  method  of  Gratton,  Sartenaer 
and  Toint.  We  show  that  the  search  direction  generated  by  our  multigrid  approach 
is  always  a  descent  direction  when  the  objective  function  fh{xk)  is  convex.  We  inter¬ 
pret  smoothing  steps  as  steps  in  an  optimization  algorithm,  and  prove  that  our  line 
search  method  does  not  require  such  steps  at  each  multigrid  cycle  to  guarantee  global 
convergence  in  the  convex  case.  We  aslo  prove  that  the  convergence  rate  is  at  least 
R-linear.  Smoothing  steps  can  be  Newton  steps  and  we  show  that  convergence  is  still 
guaranteed  without  solving  the  Newton  systems  exactly.  If  each  Newton  system  is 
solved  by  the  linear  multigrid  method,  our  algorithm  can  be  viewed  as  a  combination 
of  the  global  linearization  method  and  the  FAS  scheme  for  nonlinear  PDEs. 

This  paper  is  organized  as  follows.  In  section  2,  we  briefly  review  multigrid  meth¬ 
ods  for  PDEs.  In  section  3,  a  multigrid  method  for  solving  unconstrained  convex 
problems  is  developed.  A  proof  of  global  convergence  as  well  as  a  proof  of  R-linear 
convergence  for  uniformly  convex  problems  are  presented  in  section  4.1.  Global  con¬ 
vergence  for  general  convex  function  is  proved  in  section  4.2.  In  section  5,  we  dis¬ 
cuss  some  techniques  to  enhance  our  mulitigrid  method,  including  the  full  multigrid 
method,  different  ways  to  generate  search  directions,  and  other  ways  to  do  smoothing 
steps.  Finally  preliminary  numerical  results  are  given  in  Section  6. 

We  adopt  the  following  notation  in  this  paper:  fhtk  =  fh{xh,k),  ^  fh,k  =  Vfh(xh>k). 
Here  Xh,k  is  a  vector  where  the  first  subscript  ft  denotes  the  discretization  level  of  the 
multigrid  and  the  second  subscript  k  denotes  the  iteration  count.  If  a  vector  has  only 
one  subscript,  as  for  example  Xh,  the  subscript  h  either  refers  to  the  level  of  the  multi¬ 
grid,  and  thus  xk  itself  is  a  vector  or  it  refers  the  fact  that  Xh  is  the  ftth  component 
of  the  vector  x.  When  it  is  not  clear  from  the  context,  we  will  point  out  the  specific 
meaning.  We  use  letter  H  to  denote  the  next  coarsest  level  ft  —  1  from  level  ft.  N  is 
reserved  for  the  index  of  the  finest  level  and  No  for  the  coarsest  level. 

2.  Multigrid  Methods  for  PDEs.  Consider  solving  the  system  of  linear  equa¬ 
tions 

(2-1)  AhXh  =  bhj 

where  At l  is  a  symmetric  positive  definite  matrix  and  ft  is  the  discretization  level.  Let 
B} ,  be  an  approximate  inverse  of  Ak.  Define  Rh  to  be  the  restriction  operator  from 
level  ft  to  level  H  and  Ph  be  the  prolongation  operator  from  level  H  to  level  ft.  As  in 
standard  multigrid  methods,  we  assume  that: 
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Assumption  2.1.  The  prolongation  operator  Ph  and  the  restriction  operator  Rh 
satisfy: 

(2.2)  ahPh  =  Rl. 

For  simplicity,  we  take  07,  =  1,  which  does  not  affect  our  convergence  analysis. 

Assumption  2.2.  The  coarser  level  matrix  Ah_i  relates  to  the  finer  level  matrix 
Ah  through  Ah_  1  =  Rh  AhPh. 

Given  an  approximate  solution  Xhtk,  a  multigrid  cycle  [19,  35]  for  solving  problem 
2.1  can  be  stated  as  follows: 

Algorithm  1.  Multigrid-cycle:  Xh,k+ 1  =  MGCYCLE(h,  Ah,bh,Xh,k) 
-Pre-smoothing:  Compute  xhj,  =  xh,k  +  Bh(bh  -  Ahxh>k)- 
-Coarse  Grid  Correction: 

Compute  the  residual  fh,k  =  bh  —  AhXh,k- 
Restrict  the  residual  fh-itk  =  RMh,k- 

-Solve  the  Coarse  Grid  Residual  Equation  Ah-ieh-i,k  =  Th-\,k 
If  h  -  1  =  N0,  solve  eh-i,k  =  Aff^h- 1>k, 

Else  call  e/1—1  —  MGCA  C LEifi  —  1,  Ah—i,  rh—i,ki  0)* 

Interpolate  the  correction:  eh,k  —  Ph,eh-i,k- 

Compute  the  new  approximation  solution:  Xh,k+ 1  =  Xh,k  +  e-h,k- 

As  a  result,  we  have  the  following  iterative  algorithm: 

Algorithm  2.  Multigrid  Algorithm  MG  {Ah,  bh,e) 

Initialization:  Let  Xh,o  and  e  be  given. 

For  k  =  0,1,2,  ••  •  UNTIL  \\Ahxh}k  -  bh\\  <  e  DO 
_ Call  xh,k+ 1  =  MGCYCLE(h,Ah,bh,xhtk)- _ 

Since  understanding  the  two-grid  version  of  Algorithm  2  is  sufficient  for  under¬ 
standing  the  general  algorithm,  we  only  consider  the  two-grid  algorithm  here.  From 
Algorithm  1,  with  h  —  1  =  H,  we  have 

Xh,k+1  =  %h,k  A  Ph&H,k  =  &h,k  A  BhAfl  RhXh,ki 

where  fh,k  =  bh  —  A^Xh^k  is  the  residual  at  the  point  Xh,k-  Note  that  the  step  eu,k 
involves  A~f^ .  This  is  true  for  the  two-grid  version,  where  level  H  is  the  coarsest  level. 
Assume  that  x*f  is  the  solution  of  (2.1).  Then  at  Xh.,k,  the  error  is  Xh,k~ =  S\{xh,k  — 
x*h)  and  the  residual  is  fh,k  =  S2rh,k,  where  rh,k  =bh-  Ahxh,k ,  Si  =  Ih  -  BhAh  and 
S2  =  Ih  —  AhBh-  Hence,  the  error  at  the  new  point  Xh,k+ 1  is: 

Xh.k+1  T h  =  (I  BhAfj  RhAh)Si{Xh.k  %h)i 


and  the  residual  at  Xh,k+i  is 

Xh.k+l  —  bh.  AhX}i,k+\  =  (1  AhPhAfj  Rh)S2Vh,k' 

Therefore,  the  two-grid  multigrid  algorithm  converges  uniformly  if  the  spectral  radius 
p((I  —  AhPhAfj1  Rh)S2)  <  1.  Note  that  the  coarse  grid  correction  alone  will  not  lead 
to  convergence,  since  usually  p(I  —  AhPhAf^Rh)  >  1. 

The  smoothing  steps  of  the  two-grid  multigrid  algorithm  smooth  the  residual 
on  the  fine  level  h  and  the  coarse  grid  correction  steps  damp  the  error  on  the  coarse 
level  H.  Different  choices  of  Bh  result  in  different  iterative  methods.  Specifically,  if 
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we  decompose  Ah  as  Ah  =  Dh  —  Lh  —  Uh,  where  Dh  is  the  diagonal  of  Ah  and  —Lh 
and  —Uh.  are  the  lower  and  upper  triangular  parts  of  Ah,  respectively,  then  common 
choices  for  Bh  are: 

D^1,  Jacobi ; 

wDj1,  Damped  Jacobi  (0  <  w  <  2 /  p(D^1  Ah); 

(. Dh  —  Lh )  ,  Gauss  —  Seidel', 

u(Dh-Lh)-1,  SOR  (0  <  w  <  2). 

Let  us  now  study  the  idea  of  multigrid  from  the  point  view  of  optimization. 
Solving  the  system  of  linear  equations  (2.1)  is  equivalent  to  solving  the  strictly  convex 
quadratic  minimization  problem: 

(2.3)  min  fh{xh)  =  \xlAhxh  -  bjxh ■ 

Xh  l 

The  reduction  in  the  value  of  the  objective  function  obtained  by  moving  from  Xh,k  to 
3'h,k-\- 1  Is 

(2.4)  fh(xh,k )  -  fh{xh,k+ i)  =  [fh{xh,k)  ~  fh{xh,k )]  +  [fh{xh,k)  -  fh{xh,k+ i)], 
where  Xh,k,  the  outcome  of  the  presmoothing  step  is 

Xh,k  —  Xh,k  A  Ph,k ,  Ph,k  —  -RftV  f h,k, 

and  V  fhtk  is  the  gradient  of  fh  at  Xh,k  ■  The  reduction  of  the  objective  function  value 
between  Xh,k  and  Xh,k  is 

(2.5)  fh(xh,k )  -  fh{xh,k )  =  ~{^Ph,kAhPh,k  +Ph,k^fh,k) 

=  -\{Vfh,k)TBlAhBhS7fh,k  +  (V/,,,)T4TV/h,t 
=  (Vh^B^B-1  -  1 Ah)BhVfh,k • 

The  reduction  of  the  objective  function  values  between  Xh,k  and  Xh,k+ l  is 

(2.6)  fh{Xh,k )  ,fh(Xh,k+ 1)  =  —  (^  eh,k^h^h,k  +  eh,k^  .fh,k) 

=  ~^(V fh,k)T PhA~Hl  RhAhPhA^RhVfht  +  (V fh,k)T Ph.A'Jj1  Rh'V fh,k 

Ah 

=  l(^fh,k)TSjPhAH1RhS2Vfh,k 

since  V fh,k  =  V f{xh,k)  =  S2V fh,k-  Combining  (2.4),  (2.5)  and  (2.6),  we  have  that 

(2.7)  fh{xh,k)  -  fh(xh,k+ 1)  =  l^/fh,k)TSjPhA^1RhS2\/fh,k 

+(V/h,t)TBhT(B-1  -  l-Ah)BhS7fhM. 

If  Bh  is  the  Gauss-Seidel  operator,  we  have: 

1  -  7)  Ah  =  Dh  —  Lh  —  -  Ah  =  -Dh  -  -  (I/ft,  -  Uh), 
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where  uT (Lh  —  Uh)u  =  0  for  u  €  R"'*,  as  L h  —  Uk  is  antisymmetric.  Hence, 

-  \Ah)BhVfh,k 

=  \{VfKk)TBiDhBhVfKk  >  ^\min(BjDhBh)\\X7fh,k\\2  >  0. 

Therefore  the  reduction  of  the  objective  function  value  on  each  multigrid  cycle  is  at 
least 

(2-8)  h(xKk)  -  fh{xh,k+1)  >  ^||V/M||2, 

where  the  constant  flh  =  ^Xmin(B^ DhBh)  >  0  follows  from  the  positive  definiteness 
of  Ah-  Summing  (2.8)  over  k  from  0  to  j,  we  have: 

3 

fh(xh,o)  -  fh(xh,j+ 1)  >  ^2  /3h\\V fh,k\\2 ■ 

k—0 

Then  taking  the  limit  as  j  goes  to  +oo,  we  obtain 

lirn  \7 fh,k  =  0, 

fc— »  +  oo 

as  fh{xh)  is  bounded  below. 

Consider  now  the  two-grid  version  of  Algorithm  2  without  pre-smootliing  steps. 
Suppose  we  start  from  a  point  xk,k  that  satisfies: 

(2-9)  \\RhVfh,k\\  >  k||V/m||, 

where  n  is  a  constant.  Then  the  gradient  at  Xh,k+ 1  is 

V/M+i  =  (/  —  AhPhAjj1  Rh)^  fh,k- 

Now  using  the  fact  that  Ah  =  RhA^Ph , 

RhB  fh,k+l  =  RhB  fh,k  ~  RhAhPhA^j1  RhB  fh,k  =  0, 

which  means  that  the  next  coarse  grid  correction  en,k+ 1  =  0.  Therefore,  the  two  grid 
algorithm  can  no  longer  make  any  progress  at  the  point  Xh,k+i  by  taking  steps  in  the 
coarse  grid. 

If  we  do  a  line  search  along  the  direction  ek,k  =  ~PhA~^RhBfh,k  to  find  the  best 
point  along  that  direction,  i.e. ,  we  solve 

min  fh(xh,k  +  aehtk), 

a<ER 

we  obtain 

=  (Vfh,k)TPhA-HlRhVfhtk  =  1 

eh,kAheh,k  (V fh.,k)T PhAj/  RhAhPhA^1  RhX7 fh  k 

Ah 

Hence,  using  a  step  length  of  one  along  eh,k  is  optimal1.  Moreover,  we  have  from  (2.6) 
and  (2.9)  that 

fh{Xh,k )  —  fh{Xh,k+l)  =  2  (y  ,fh,k)T PhAtf1  RhV  fh,k  >  Ph,k  ||  V/ft ,fc || 2, 

^or  the  use  of  steplength  optimization  in  linear  multigrid  methods  see  [31]. 


6 


ZAIWEN  WEN  AND  DONALD  GOLDFARB 


where  / 3h,k  =  |K2/Amax(^4//).  Although  matrix  PhAj^Rh  is  not  of  full  rank,  the 
reduction  of  the  objective  function  value  is  still  bounded  below  by  the  square  of  the 
norm  of  the  gradient  multiplied  by  the  positive  constant  (3h,k-  Hence,  the  recursive 
step  eh,k  is  a  good  step. 

The  two-grid  algorithm  can  avoid  a  break  down  if  it  does  pre-smoothing  steps 
until  a  point  Xh ,fc  is  generated  that  satisfies 

(2.10)  \\RhVfh,k\\  >  wllV/h.fcH,  and  fh{xhtk)  <  fh{xh,k)- 

Then  a  recursive  step  can  be  taken  and  the  sequence  {xh,k}  will  converge  to  the 
optimal  solution.  The  two-grid  algorithm  with  traditional  smoothing  steps  guarantees 

(2.10)  and  the  smoothing  steps  also  ensure  that  the  norm  of  the  gradient  decreases, 
i.e.,  ||V/M+i||  <  r'/iH  V/h,fc||  for  some  constant  0  <  uh  <  1. 

Generating  the  coarse  grid  correction  step  eu,k  can  also  be  viewed  as  solving  a 
coarser  level  minimization  problem  that  is  closely  related  to  the  finer  level  problem 
(2.3).  From  the  coarse  grid  residual  equation,  we  have 

Ane.H,k  =  x  H,k  =  —RyS7fh,k 

AH(xH,k  +  eH,k)  —  bn  =  AHxH,k  ~  bn  —  RfN fh,k 

AH{xH,k  +  en,k )  =  bn  +  (V/h^  -  RhS7  fh,k)- 

(The  above  construction  for  the  coarse  grid  residual  equation  is  also  what  is  done 
in  the  FAS  [19,  34]).  Hence  en,k  is  identical  to  en,k  =  x*H  —  xjj,k ,  where  x*H  is  a 
minimizer  of  the  problem 

(2.11)  min  {4>h{xh)  =  fH{xH)  -  ( vH)TxH } 

XH 

and  vh  —  V  fn,k  ~  RhS?  fh,k-  This  interpretation  provides  a  motivation  for  extending 
the  multigrid  Algorithm  2  to  an  algorithm  for  minimizing  a  general  convex  function. 

3.  A  Multigrid  Method  for  Unconstrained  Convex  Optimization.  In 

this  section,  we  develop  a  multigrid  method  for  the  uppermost  finest  level  problem 

(3.1)  min/N(xN). 

Without  loss  of  generality,  we  shall  explain  the  basic  idea  underlying  this  method 
starting  from  the  fcth  iteration  Xh,k  at  level  h.  Whenever  possible,  we  will  compute 
a  search  direction  dh,k  by  resorting  to  problems  on  coarser  levels  recursively.  If  the 
current  level  is  the  coarsest  level  or  the  coarser  level  model  is  not  a  good  choice,  a 
direction  dh.,k  will  be  computed  directly  on  the  current  level  h. 

If  a  “ recursive  search ”  direction  is  chosen,  we  first  move  to  the  next  coarsest 
level  H  with  an  initial  point  Xh,q  =  RhXh,k ■  Next  we  compute  the  minimizer  (or 
approximate  minimizer)  Xh,i *  of  the  coarse  level  problem 

min  iPh(xh), 

where  ipn  is  an  approximation  of  the  original  problem  (1.1)  on  the  coarse  level  H. 
The  function  i/jh,  which  we  will  define  below,  depends  on  the  point  x h,k  and  the  level 
h  and  will  be  different  for  different  points.  To  simplify  our  notation,  we  will  omit  this 
dependence  when  referring  to  iPh(')  or  its  derivatives  as  this  should  not  cause  any 
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confusion.  Then  we  project  the  direction  d*H  =  Xh.-i*  —  xh,o  on  level  H  back  to  level 
h  to  obtain  the  recursive  search  direction 

/**-! 

(3.2)  dh,k  =  Phd*H  =  Ph  I  ^  ctH,idH,i 

V  i= 0 

where  an.i  and  dn.t  are  the  step  size  and  search  direction,  respectively,  for  the  ith 
iteration  on  level  H.  Here  each  search  direction  dn,i  from  xr.%  to  Xn,i+i  for  i  = 
0,  •  •  •  ,  **  —  1  is  also  computed  recursively  whenever  possible. 

If  a  “ direct  search"  direction  is  chosen,  dh,k  is  computed  directly  on  level  h.  Many 
possibilities  exist  for  how  to  compute  such  a  direction.  To  illustrate  our  algorithm, 
we  solve  the  Newton  system: 

(3-3)  Gh:kdh,k  =  9h,k 

exactly  or  inexactly  to  obtain  where  we  have  used  the  notation  g^.k  =  ^fph,k  = 
Viph(%h,k)  and  Gh,k  =  V2tph,k  =  V2ifh(xh,k)-  As  stated  above,  we  must  use  this 
direct  search  direction  when  the  coarse  level  model  is  not  appropriate.  Specifically, 
we  restrict  the  use  of  the  recursive  search  direction  at  the  point  Xh,k  to  the  case  where 

(3-4)  \\Rh9h,k\\  >  \\Rh9h,k\\  >  eh- 

The  reason  for  this  is  that  Rh9h,k  may  be  zero  even  though  gh,k  is  not  zero  if  gh.k 
lies  in  the  null  space  of  Rf l;  hence  the  current  iterate  appears  to  be  a  stationary  point 
for  if>H  whereas  it  is  not  for  i/v  These  conditions  are  the  same  as  those  used  in  the 
multigrid  algorithm  proposed  in  [18,  16,  17]. 

Let  us  now  define  the  coarse  level  approximation  ipu  explicitly.  To  ensure  con¬ 
vergence  and  efficiency,  the  coarse  level  problem  is  not  simply  the  discretized  problem 
(1.1)  for  the  coarse  level  H,  but  rather: 

(3.5)  min  {4>h(xh)  =  }h{xh)  -  ( vH)TxH }, 

XH 

where  vr  =  V/ff, o  —  Rh,gh,k-  Furthermore,  if  we  define  cn  =  0,  then  model  (3.5)  can 
be  naturally  extended  to  all  levels  and  the  uppermost  level  model  problem  is  exactly 
problem  (3.1).  Moreover,  (3.5)  enforces  a  certain  coherence  between  the  fine  level 
problem  ^  and  the  corresponding  coarse  level  problem  ipu. 

Lemma  3.1.  If  we  choose  the  recursive  scheme  to  generate  the  direction  dh,k  = 
Ph.d*H,  where  the  minimization  on  the  coarse  level  H  starts  from  the  initial  point 
xr, o  =  RhXh,k,  then  the  problems  of  the  two  consecutive  levels  h  and  H  are  first- 
order  coherent  in  the  sense  that 

(3.6)  gH,  o  =  Rh9h,k ,  (dh,k)T  9h,k  =  (d*H)T  gn,o- 


Proof.  The  first  part  of  (3.6)  comes  from  the  fact  that 

9h, o  =  V/if, o  —  vr  =  V///, o  ~  V/hj0  +  Rhgh.k  =  Rh,gh,k- 
This  together  with  (3.2)  and  (2.2)  and  our  assumption  that  ay  =  1,  implies  that 
(dh,k)T gh,k  =  (Phd*H)Tgh,k  =  (d*H)T  Rhgh.k  =  (d*H)r  gn,  o- 
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□ 

The  following  lemma  shows,  the  recursive  search  direction  dh,k  is  a  descent  direc¬ 
tion  for  ip h  at  Xh,k  if  /h{xh )  is  a  convex  function. 

Lemma  3.2.  Suppose  f h{xh )  is  a  convex  function.  If  we  choose  the  recursive 
scheme  to  generate  the  direction  dh,k  =  Phd*H,  where  the  minimization  on  the  coarse 
level  H  starts  from  the  initial  point  xh,o  =  RhXh.k  and  stops  at  the  point  xjj,i *  with 
'f>H{xH,i*)  <  iPh(xh,o),  then  dh,k  is  a  descent  direction;  that  is  (dh,k)T Qh,k  <  0. 
Moreover,  the  directional  derivative  (dh,k)T 9h,k  satisfies 

(3.7)  -{dh,k)T 9h,k  >  *Ph,o  —  ip H,i *• 

Proof.  Since  fn(xH)  is  convex,  so  is  iPh(xh)’,  hence 

(3.8)  IpH {XH,i*  )  >  iPh{Xh, o)  +  (■ XH,i *  —  Xh,o)T 9H,0- 

Hence  we  conclude  that  inequality  (3.7)  holds,  and  from  the  fact  that  ipH{xH,i*)  < 
iPh(xh,o),  it  follows  that  (d*H)T gnp  <  0.  □ 

In  our  algorithm,  we  chose  a  step  size  oth,k  along  the  direction  dh,k  that  satisfies 
the  Armijo- Wolfe  conditions 

(3.9a)  iph{xh,k  +  oth,kdh,k)  <  iph,k  +  Piah,k(gh,k)T dh,k, 

(3-9b)  (W 'iph(.Xh,k  T  .kdh.k'))  d},  },  P  p2 (gh,k)  dh  ki 

where  0  <  p\  <  pi  <  1  are  two  controlling  parameters.  The  smaller  p2  is,  the  stricter 
the  line  search  is.  To  select  a  step  size  a.h,k  to  satisfy  (3.9a)  and  (3.9b),  we  refer  the 
reader  to  Algorithms  3.2  and  3.3  in  [30],  which  are  based  on  interpolation  and/or 
bisection.  For  a  more  detailed  description  of  these  kind  of  strategies,  see,  for  example 

[2S]- 

Our  multigrid  algorithm  stops  when  the  norm  of  the  gradient  is  smaller  than  a 
given  tolerance  ,  i.e.  ||g/i,fc||  <  e/,.  It  also  limits  the  number  of  iterations  to  at  most  K 
at  all  levels  h  other  than  the  finest  level  h  =  N. 

Algorithm  3.  xh  =  MLS(h,Xh,o,9h,o) 

Step  1.  If  h  <  N,  compute  vh  =  V/M  -  gh, o,  set  gh>0  =  gh< 0; 

Else  set  Vh  =  0  and  compute  gh, o  =  ^ fh, o- 
Step  2.  For  k  =  0, 1, 2,  •  •  • 

2.1.  If  ||5/t,fc||  <  eh  or  if  h  <  N  and  k  >  K, 

Return  solution  Xh,k\ 

2.2.  If  h  >  N0  and  \\Rhgh,k\\  >  K\\gh,k\\  and  \\Rh.gh,k\\  >  eh 

- Recursive  Search  Direction  Computation 

Call  Xh-i,i*  =  MLS(h  —  1,  RhXh,k,  Rhgh,k)  to  return  a  solution  (or  ap¬ 
proximate  solution  )  Xh-i,i *  of  “minXh  l  iph-i{xh-i)"  ■ 

Compute  dh,k  =  Ph(xh-i,i*  -  RhXh,k)  =  Phd*h_1. 

Else 

-Direct  Search  Direction  Computation 

Solve  Gh,kdh,k  =  — 9h,k  exactly  or  inexactly  to  obtain  dh,k- 

2.3.  Call  line  search  to  obtain  a  step  size  ah,k  that  satisfies  the  Armijo-Wolfe 
conditions  (3.9a)  and  (3.9b). 

2.4.  Set  Xh,k+ 1  =  Xh,k  T  OZh,kdh,k ■ 


A  LINE  SEARCH  MULTIGRID  METHOD 


9 


Remark  3.3.  Algorithm  3  automatically  chooses  between  the  direct  search  direc¬ 
tion  and  the  recursive  direction  based  on  condition  (3.4).  Therefore  it  can  be  viewed 
as  a  combination  of  the  global  linearization  method  and  the  FAS  scheme  when  applied 
to  nonlinear  PDEs. 

One  element  in  Algorithm  3  that  we  have  not  fully  specified  is  how  to  solve  the 
Newton  system  (3.3)  in  the  direct  search  direction  computation.  The  most  straight¬ 
forward  way  is  to  solve  (3.3)  exactly  using  factorization  methods  [15].  However,  doing 
this  is  very  expensive  on  the  finer  levels.  A  very  natural  adaptive  strategy  is  the 
following.  Whenever  we  are  on  levels  where  the  total  number  of  variables  is  not  too 
large  and  the  corresponding  Hessian  is  sparse,  compute  the  Hessian  and  its  Cholesky 
factorization  directly,  i.e. ,  (3.3)  is  solved  exactly;  for  all  other  cases,  we  only  solve 

(3.3)  to  a  certain  accuracy  by  using  a  (preconditioned)  conjugate  gradient  method 
[15]  or  the  multigrid  Algorithm  2. 

4.  Convergence  Analysis.  Throughout  this  section,  we  define 

d&  f 

(4.1)  vo  =  maxjl,  max  ||Pj||}  =  max{l,  max  ||i?j||}  <  oo, 

,N  i= l,---  ,N 

and  adopt  some  concepts  and  notation  from  [18,  16,  17]. 

1.  We  shall  refer  to  the  fcth  iteration  on  level  h  as  iteration  (ft,  k).  We  define  the 
iteration  ( h ,  k)  as  the  predecessor  of  a  minimization  sequence  that  consists 
of  all  successive  iterations  on  level  h  —  1  until  a  return  is  made  to  level  h. 
If  iteration  ( h  —  1,Z)  is  in  this  minimization  sequence,  we  use  the  notation 
(ft,  k)  =  ir(h  —  1,  l)  to  indicate  this. 

2.  For  iteration  (ft,  ft),  we  define  the  set 

dsf 

(4.2)  1Z(h,k)  =  {(j,l)  \  iteration  (j,  /)  occurs  within  iteration  (ft,  k)} 
and  the  deepest  level  in  lZ(h,  k)  by 

(4.3)  p(h,  k)  d=  min  j 

3.  We  denote  by  T(ft,  k)  the  subset  of  iterations  (j,  l)  G  TZ(h,  k)  in  which  djg  is 
a  direct  search  direction,  i.e., 

(4.4)  T(ft,  k)  =  {( j,l )  €  lZ(h,k)  \  djg  is  a  direct  search  direction  }. 

4.1.  Uniformly  Convex  Problems.  In  this  subsection,  we  assume 
Assumption  4.1.  fh(x)  is  twice  continuously  differentiable  and  uniformly  con¬ 
vex;  that  is,  there  exist  constants  0  <  to/j  <  Mh  <  oo  such  that 

(4.5)  mh\\d\\l<dTV2fh{x)d<Mh\\d\\l,  VdGH"*, 

for  all  x  G  {x  |  fh{xh)  <  fh{xh, o)}-  Moreover,  let  m  =  minft{m^},  M  =  ma xh{Mh}. 

Since  iph(x)  differs  from  fh(x)  only  by  a  linear  term,  Assumption  4.1  also  holds 
for  'ifhfx).  In  the  following,  we  state  some  useful  properties  of  convex  functions. 

Lemma  4.2.  ([32]:  Lemma  5.3.4)  Suppose  fh{x),  and  hence  il>h(x),  satisfy  As¬ 
sumption  4-1. 
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1.  If^h(y)  <  iph(x),  then 

(4.6) 

2.  For  all  x, 

(4.7) 


||V^(*)||>-||y-*||. 


^-\\x  -  rr* || 2  <  iph(x)  -  i>h(x*)  <  —  \\Viph(x) 
2  m 


where  x*  is  the  unique  minimizer  of  iph  (x)  ■ 

Lemma  4.3.  ([32]:  Theorem  2.5.8)  Suppose  iph(x)  satisfies  Assumption  4.1.  If 
a  is  a  step  size  that  satisfies  the  Armijo  condition  (3.9a)  along  a  descent  direction  d, 
then  the  decrease  ofifhix)  satisfies  iph(x)—iph(x+ad)  >  cilladll2  withci  =  — p)m 

l+V  M  /  m 

We  will  also  make  use  of  the  following  inequality. 

Lemma  4.4.  Letdi,d2,---  ,c4-  be  vectors  in  1" .  Then  H4?'||2  —  ill  Ylj=i  4/'l|2 

Proof.  We  prove  this  lemma  by  induction  on  k.  The  result  is  trivial  if  k  =  1. 
Suppose  the  inequality  is  true  for  k  —  1 ;  we  now  prove  that  it  is  also  true  for  k. 


£  nil2 -)ii£<y 

3= 1  3= 1 


k- 1 


/ 


k- 1 


> 


^ll^.f  +  RII2-^  I  II  E^ll2 +  11^11 

3= 1  ‘  " 


V 


3=1 


_  1 
"  k 

>  0, 


/ 


k- 1 


'  k- 1 


-TllE^H2  +  (fc-1)ll^l|2-2 


3=1 


^3  =  1 


where  the  last  inequality  comes  from  the  Cauchy-Schwartz  inequality  and  the  fact 
that  -a2  +  eh2  >  2 ab  for  arbitrary  scalars  a  and  b  and  e  >  0.  This  proves  the  lemma. 

□ 

Based  on  the  curvature  condition  (3.9b)  and  the  uniform  convexity  of  ifh,  we  have 


aj}iM\\djti\\2  >  (djti)T[\/ijjh{xjti  +ajtidjti)  -  Viph(xjti)] 
>  ~{l  -  pi){gj,i)T djti 


for  any  iteration  ( j ,  l)  G  7 Z(h,  k).  Hence,  the  step  size  ah[  is  bounded  below  by: 


(4.8) 


<Xj,l  >  (1  -  P2 ) 


\(gj,i)Tdj,i\ 

M\\dJtl\\*  ' 


We  will  now  show  that  for  certain  search  directions  both  ay;  and  cos where  6(y; 
is  the  angle  between  djj  and  the  steepest  descent  direction  — ,  are  bounded  away 
from  zero.  Therefore,  for  such  choices,  the  minimization  sequence  generated  by  Algo¬ 
rithm  3  on  the  uppermost  finest  level  is  globally  convergent  whereas  the  minimization 
sequences  on  all  other  coarser  levels  are  either  globally  convergent  or  stop  after  at 
most  K  steps. 
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Let  us  first  consider  the  direct  search  direction.  Specifically,  we  show  that  these 
particular  choices  for  this  direction  satisfy: 

Condition  4.5.  If  iteration  (j,l)  £  7~(h,k),  the  step  direction  djj  and  the  step 
size  ajj  satisfy 

(4.9)  ajti  >  ar,  ||dj,;||  <  Pr\\9j,i\\  and  ~  ( dj,i)T  9j,i  >  Vr\\gj,i\\2, 

where  ar,  dr  and  rjr  are  positive  constants. 

Note  that  the  last  two  inequalities  in  (4.9)  imply  that  cos >  rjr/Pr-  The 
steepest  descent  search  direction  djj  =  —ghi  obviously  satisfies  Condition  4.5.  The 
following  lemma  shows  that  the  exact  Newton  step  satisfies  Condition  4.5. 

Lemma  4.6.  If  iteration  (j,l)  £  T(h,k)  and  the  step  direction  djy  satisfies 
Gjjdjj  =  —gj,i  exactly ,  then  Condition  f.5  is  satisfied  with  parameters 

mf  1  1 

(4-10)  ar  =  (1  -  Pr  =  —  and  rjT  =  vy, 

M z  to  M 

Proof.  Since  the  step  djj  satisfies  Gjjdjj  =  ~9j,i  exactly  and  Assumption  4.1 
holds,  we  have 


and 


~(dj,i)Tgj,i  =  \  {gj.i)TGj^ghl  |  >  M 


_1H  9j,l 


IMilll2  —  (dj.i)  Gj j  gjti  <  to  \\gj,l H2; 

which  together  with  (4.8)  yields  (4.9)  with  parameters  (4.10).  □ 

Now  consider  the  inexact  Newton  step  generated  by  the  conjugate  gradient  method 
(CG)  method.  Given  initial  values  so  =  0,  ?’o  =  g  and  po  =  —  ro,  the  CG  method  for 
solving  the  system  linear  equations  Gd  =  —g  generates 

si+i  =  Si  +  a. ipi,  ri+ 1  =  r»  +  aiGpi,  pi+1  =  -r»+ 1  +  A+i Pi,  for  *  =  0, 1,  ■  ■  ■  , 
where  a,-  =  r4ff'  and  =  r»+in+1 .  The  solution  is  set  to  d  =  Sj  when  a  certain 

p?  Gpi  1  ~  r>  Vi 

accuracy  is  achieved.  The  CG  method  is  invariant  under  an  orthogonal  transformation 
[38].  Define  g  =  QT g  and  G  =  QTGQ ,  where  Q  is  any  given  orthogonal  matrix.  Let 
(si,  Ti,pi)  and  (Si,  fi,pi)  be  iterates  generated  by  the  CG  method  applied  to  Gd  =  —g 
and  Gd  =  —g,  respectively.  Then,  it  can  be  shown  that  s,;  =  QT Si,  ft  =  QTVi 
and  pi  =  QTPi-  In  particular,  for  any  given  g  £  and  any  symmetric  matrix  G, 


there  exists  an  orthogonal  matrix  Q  such  that  QJ 

g  is  parallel  to  the  first  coordinate 

direction  and  QT  GQ  is  a  tridiagonal  matrix.  Specifically,  we 

have 

fui 

Vl 

0  •• 

0 

0  ^ 

Vi 

U2 

v2  •  • 

0 

0 

(4.11)  g  =  \\g\ \ej,  G  = 

0 

V2 

u3  ■  ■ 

0 

0 

■> 

0 

0 

0  •• 

Urij-l 

vnj-i 

\o 

0 

0  •• 

■  <S-i 

vnj  ) 

where  ej  is  a  vector  whose  ith  element  is  one  and  all  other  elements  are  zero.  We  also 
denote  the  submatrix  of  the  first  i  rows  and  i  columns  of  G  by  Gl.  Then  Lemma  3  in 
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[38]  shows  us  that 

(4.12)  ai+1  =  -||5||  ((Gi)“lei)  ,  fi+1  =  (-iyei+1\\g\\^0- 

Using  the  facts  mentioned  above,  the  following  lemma  shows  that  the  inexact 
Newton  step  generated  by  the  CG  method  satisfies  Condition  4.5. 

Lemma  4.7.  If  iteration  ( j,l )  €  T(h,k)  and  the  step  dj,i  is  generated  by  the 
conjugate  gradient  method,  then  Condition  4.5  is  satisfied  with  parameters  (4.10). 

Proof.  For  ease  of  notation,  we  temporarily  drop  the  subscripts  (j,l).  Since 
the  CG  method  is  invariant  under  an  orthogonal  transformation,  we  can  analyze  its 
behaviour  when  applied  to  the  system  of  equations  Gd  =  —g,  where  G  and  g  have 
the  form  given  in  (4.11).  Because  an  orthogonal  transformation  does  not  change  the 
eigenvalues  of  a  matrix,  Assumption  4.1  holds  for  G.  The  well-known  interlacing 
eigenvalue  theorem  for  bordered  matrices  also  implies  that  Assumption  4.1  holds  for 
all  submatrices  Gl.  From  the  relationship  between  ( gi,Si )  and  {gi,sf)  and  the  fact 

(4.12) ,  it  follows  that 

-gTsi+ 1  =  -(g)Tsi+ 1  =  \\g\\2eJ{Gl)~1e1  >  M“1||5||2. 


In  addition, 


lk+1||  =  ||«i+i||  =  ||<?||  ||(Gi)-1e1||<m-1||ff||. 


Therefore,  since  d  =  Sj+i  when  the  algorithm  exits,  we  obtain  (4.9)  with  parameters 
(4.10)  similar  to  Lemma  4.6.  □ 

The  following  lemmas  show  that  the  recursive  steps  satisfy  properties  that  enable 
us  to  prove  convergence  of  our  multigrid  method  if  the  direct  search  directions  satisfy 
Condition  4.5. 

Lemma  4.8.  Suppose  iteration  ( j,l )  €  7 Z(h,k)\T(h,k)  and  Condition  4-5  is 
satisfied  by  all  direct  search  steps.  Then  the  step  size  aj,i  is  bounded  below: 


(4.13) 


OL  j,  l  >  OLX  = 


Cl(l  -  P 2) 
MKw2  ’ 


where  K,  defined  in  Algorithm  3,  is  the  maximum  number  of  iterations  of  the  min¬ 
imization  sequence  at  level  j  —  1.  Therefore,  ctjy  >  a*  =  min{a7-,  chi}  for  any 
0)0  G  Tl(h,k). 

Proof.  From  the  inequality  (3.7),  it  follows  that 


-{djti)T  gjy  >  ip  j  ~i,o  ~  ipj—i,i*  • 

Since  the  minimization  sequence  is  monotonically  decreasing,  the  reductions  of  the 
function  value  satisfy 


i*  — 1 

Ipj- 1,0  —  Ipj- 1,»*  >  Vb-Rfc  —  1pj-l,k+l- 
k= 0 

Since  'ipj _  1  is  uniformly  convex,  it  follows  from  Lemma  4.3  that 
'fij—l.k  Ipj-l.kj-l  0.1  i,kdj— \,k  [|  • 
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Using  Lemma  4.4  and  the  fact  that  the  total  number  of  iterations  at  level  h—  1  is  less 
than  K,  we  have 


i*  —  l 


k- 0 


(4.14)  >  ci^ll  E  ^kdj-iA2  >  glK-if  >  j^KHI2, 


where  the  last  inequality  comes  from  the  fact  that  djj  is  a  prolongation  of  d*_1  and 

Hull  =  \\pjdj-i\\  <  ||.Pj|| 

Therefore,  combining  (4.8)  and  (4.14),  we  obtain 


a3,l  > 


Cl(l  -  P2) 
MYw2  ' 


which  completes  the  proof.  □ 

Lemma  4.9.  Suppose  iteration  (j,l)  £  lZ(h,k)\T(h,k)  and  Condition  f.5  is 
satisfied  by  all  direct  search  steps.  Let  p  be  the  deepest  level  in  1Z(j,l)  such  that 

(4.15)  9p.o  =  ^p+i5p+i,o  ~  *  *  *  =  Llp+i  •  •  •  Rj g:j:i ■ 

Then  for  any  iteration  (■ q ,  k )  =  (g,  0),  where  p  <  q  <  j,  and  for  iteration  (g,  k)  =  (j,  l), 
we  have 

(4.16)  COS [Oq^k)  $q—p  and  ( dq^k )  9q.k  ^  Vq  —  pII^aII  5 

where  8 q_p  =  ™ Vq-P  and  rji  =  (cCpiA2)* 

Proof.  1.  We  will  prove  (4.16)  for  (q,k)  =  (g,  0)  where  p  <  q  <  j  by  induction 
on  q.  First,  let  us  consider  iteration  ( p  +  1,  0)  which  is  computed  recursively.  From 
inequality  (3.7),  it  follows  that 

(4.17)  (i^p+1,0)  9p-\- 1,0  ^Pp. 0  ipp,i*  ^Pp, 0  V7?,i  —  QppPi  (dp7o )  9p.O: 

where  the  last  inequality  comes  from  the  Armijo  condition  (3.9a)  for  iteration  (p,  0). 
Since  (p,  0)  is  computed  directly, 

-{dpp)T gpfi  >  77r||5p,o|||- 
From  (4.15)  and  the  first  condition  in  (3.4),  we  obtain 

(4-18)  Hflp.olll  =  ||-Rp+i5p+i,o||2  >  ll^p+1,0 II 2 ■ 

Combining  all  of  these  facts  together,  we  get 

~ (^p+i,o)T<7p+i,o  >  a* Pi^VrWgp+i.oW^i 

which  proves  the  second  inequality  of  (4.16)  for  q  =  p  +  1.  From  Lemma  4.2,  we 
obtain 


II.9p+i,o II2  >  ylldp+i.oll, 


which  completes  the  proof  of  the  first  inequality  of  (4.16). 
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Now,  suppose  (4.16)  holds  for  p  <  q  <  j  —  1;  we  prove  that  (4.16)  also  holds  for 
<7+1.  Similar  to  the  case  q  =  p  +  1,  we  have 

l,o)  9q+ 1,0  ^  ^<7,0  f)q,i*  0  ^qP  —  Pl&q,o(dqfi')  9q. 0 

>  pia*  (a*piK2)9  P?7t||s<z,o||2 

>  pia*  (aViK2)9  P77tk2||59+i,oI|2 
=  ^ij+l-pllSg+l.oll  i 

since  relationship  (4.18)  also  holds  with  p  replaced  by  q.  Using  Lemma  4.2  again,  we 
obtain  (4.16). 

2.  For  iteration  (j.  I ),  inequality  (4.16)  holds  by  simply  repeating,  in  an  analogous 
fashion,  the  above  proof: 

( dj . I )  9j,l  ft  Vb*  — 1,0  —  Vb'  — 1,0  1pj—l,l  —  Pl^j—1, o(,dj—  i,o)  6b‘  — 1,0 

>  pia*  (a*piK2)]  P  1  ??r||5'j-i,o||2 

>  %-pllffilll2- 


□ 

We  can  now  prove  the  global  convergence  of  Algorithm  3. 

Theorem  4.10.  Suppose  Condition  4-5  is  satisfied  by  all  direct  search  steps. 
Then  the  iterative  sequence  {a:N,fc}  generated  by  Algorithm  3  at  the  uppermost  level 
converges  to  the  unique  minimizer  of  /n(®n)- 

Proof.  The  step  size  ttN.fc  at  the  uppermost  level  is  bounded  from  below  by  a 
constant  a*  >  0  from  Lemma  4.8.  From  the  Armijo  condition  (3.9a),  we  have 

V’N.fe  —  V’N.fe+l  >  — Q!N,fcdN,fe5N,fc- 

Therefore,  since  by  Assumption  4.1  %/)(■)  is  bounded  below,  lim^oo  dj  =  0. 
From  Lemma  4.9,  we  have 


_^N,fc5N,fc  >  cr||(/N,fc||2 

for  some  constant  a.  This  shows  that 

(4.19)  lim  ||VM*N,fc)||  =° 

k — >oo 

holds,  since  V/n(xn,A;)  =  <7N,fc  (recall  that  zjn  =  0).  The  uniqueness  of  the  minimizer 
is  from  the  strict  convexity  of  ./n^n)  in  Assumption  4.1.  □ 

We  now  prove  R-linear  convergence. 

Theorem  4.11.  Suppose  Condition  4-5  is  satisfied  by  all  direct  search  steps. 
Assume  that  the  iterative  sequence  {xn.a-}  generated  by  Algorithm  3  at  the  uppermost 
level  converges  to  the  unique  minimizer  {x^}  o/Jn^n)-  Then  the  rate  of  convergence 
is  at  least  R-linear. 

Proof.  Again  from  Condition  4.5  and  Lemma  4.9,  we  have 

(4.20)  /n(%,hi)  ~~  /n(^na)  <  ^cU?7n||V/n(xn,/c)||2- 
From  the  second  inequality  of  (4.7)  in  Lemma  4.2,  we  get 

|| V/N(^N,fc)||2  >  m(fN(xN,k)  -  /n^n))  • 
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Hence 

/nOen.m-i)  —  fa{xN,k)  <  —  a*i^m.  (/n^na)  ~  /n(*n))  > 

where  0  <  o*?7nTO  <  1  can  be  verified  straightforwardly.  By  subtracting  /n(^n)  from 
both  sides  of  the  above  inequality,  we  have: 

fa{x-N,k+i)  -  /n(zn)  <  (1  -  a*r/Nm)  (/nOl’na)  -  /n(£n))  ■ 

From  the  first  inequality  of  (4.7)  in  Lemma  4.2,  we  obtain  that 

/nO^na-)  -  /n(Zn)  >  y  ||xN,fc  -  ^nII2- 

Hence 


||£CN,fc  -  ZnII  < 
< 


(/n^na)  -  /n^’n))2 
(1  -  a*r)Nm)i (fa(xN,k-i 


) 


/n(zn))2 


< 


—  (1  -  a*r/Nm)2  (/n(zn,o) 

TO 


-  /n(®n))  2  '<> 


□ 

Corollary  4.12.  For  any  e  >  0,  after  at  most 

=  logUMzN.o)  -  /n(x^))/c) 
log(l/c) 


iterations,  where  0  <  c  =  1  —  ma2  —  <  1,  we  have  /n(£n,aO  —  /n(^n)  —  e- 

Proof.  With  the  help  of  inequality  (4.20)  and  from  the  standard  convergence 
analysis  for  convex  functions  [8],  we  have  the  result  immediately.  □ 

4.2.  Relaxing  the  Uniform  Convexity  Assumption.  In  this  subsection,  we 
prove  the  global  convergence  of  Algorithm  3  for  general  convex  functions.  Let  us 
replace  Assumption  4.1  by  the  following  assumption. 

Assumption  4.13.  , 

1.  The  level  set  V ^  =  {a;^  :  if>h{xh)  <  "fhixh, o)}  is  bounded. 

2.  The  objective  function  ifh  is  convex  and  continuously  differentiable,  and  there 
exists  a  constant  L  >  0  such  that 

(4.21)  \\Vi(>h(xh)  -  Viph(xh)\\  <  L\\xh  ~  z/J,  for  all  xh,xh  &Vh. 

3.  The  Hessian  matrix  is  bounded 

(4.22)  ||  Gh(xh)\\<M 
for  all  Xh  in  the  level  set  T>h- 

This  assumption  implies  that  there  is  a  constant  7  such  that 
(4.23)  WVifhixh)]]  <  7,  for  all  xh  G  Vh. 

Let  us  first  consider  the  direct  search  direction.  Since  the  Hessian  Gh.k  is  only  pos¬ 
itive  semi-definite,  the  direction  dh,k  generated  by  solving  the  system  of  equations 


16 


ZAIWEN  WEN  AND  DONALD  GOLDFARB 


Gh,kdh,k  =  — gh,k  may  not  satisfy  Condition  4.5.  One  strategy  is  to  add  a  small  posi¬ 
tive  multiple  of  the  identity  matrix  //,  to  Gh,k  so  that  Gh,k  =  Gh,k  +  mlh  h  rnlh  for 
some  constant  m  >  0.  Then  dh,k  can  be  computed  by  solving  the  modified  Newton 
system 

(4.24)  Gh,kdh,k  =  —Qh,k- 

As  in  the  proofs  of  Lemmas  4.6  and  4.7,  we  can  easily  show  that  the  exact  Newton 
step  and  the  inexact  Newton  step  generated  by  the  conjugate  gradient  method  satisfies 
Condition  4.5.  We  also  make  the  following  assumption: 

Assumption  4.14.  The  step  size  ahtk  is  bounded  from  above  for  any  iteration 

( h,k );  i.e.,  ah,k  <  5. 

The  following  lemma  shows  that  the  norm  of  the  search  direction  is  uniformly 
bounded  from  above. 

Lemma  4.15.  Suppose  Condition  f.5  is  satisfied  by  all  direct  search  steps.  Then 
||dy; ||  <  7  for  all  iterations  (j,l)  £  lZ(h,k). 

Proof  1.  If  iteration  (j,  l)  £  T(h,k),  we  obtain 

114*11  <  0T\\9j,i\\  <  7/4 

from  Condition  4.5  and  the  fact  (4.23). 

2.  Now  consider  iteration  (j,l)  £  7 Z(h,k)\T(h,k).  Let  p  be  the  deepest  level  in 
7 Z(j,  l)  such  that 

(4-25)  <7p,o  =  Rp+igp+i,o  =  ■  ■  ■  =  Rp+i  ■  ■  ■  Rj9j,i- 

We  prove  this  part  by  induction  on  levels.  Since  p  is  the  coarsest  level  and  the  total 
number  of  iterations  at  level  p  is  less  than  K,  we  obtain 


»*- 1  i*—l 

n<wuii  =  pw£  Oip,kdp,k)\\  <  070  ^  II 4, fell  <  tzo'K'yPt 

fc= 0  k—0 

which  proves  the  case  q  =  p  +  1.  Suppose  the  lemma  is  true  for  level  q  —  1,  we  prove 
that  it  is  also  true  for  level  q.  Similarly,  we  have 


114*11 


i*- 1  i*-l 

II  i,kdq-i,k)\\  <  wa  ^2  114-1, fcll  ^  waKy. 

fc= 0  fe= o 


For  ease  of  notation,  we  still  denote  the  right  hand  side  by  7,  which  is  finite  and 
bounded  from  above  since  there  is  only  a  finite  number  of  levels  and  the  number  of 
iterations  of  each  minimization  sequence  on  the  coarser  levels  is  at  most  K.  This 
completes  the  proof.  □ 

The  following  lemma  shows  that  the  directional  derivative  along  a  recursive  search 
direction  and  the  step  size  are  bounded  from  below  by  the  norm  of  the  gradient  raised 
to  some  power. 

Lemma  4.16.  Suppose  iteration  ( j,l )  £  R(h,k)\T(h,k)  and  Condition  f.5  is 
satisfied  by  all  direct  search  steps.  Let  p  be  the  deepest  level  in  lZ(j,  l)  such  that  (4.25) 
satisfies.  Then  for  any  iteration  ( q,k )  =  (q,  0),  where  p  <  q  <  j ,  and  for  iteration 
(q,k)  =  we  have 


(4.26) 

(4.27) 


dq^k9q,k  P. 


M1  -  Pi) 


(2i-1-l) 


(piaTriTK2(j  P)I4,*I|2)2’ 


2*—  2 


oy.fc  >  Pi 


(2i_1  — 1)  (  1  -  P2 


(2* 


7' 

(PldTVTK2<'3~P^\\gj,l\\2)2' 

72i 
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where  i  =  q  —  p. 

Proof.  1.  We  prove  this  lemma  by  induction  on  level  q.  First,  let  us  consider 
iteration  (jp  +  1,0).  From  inequality  (3.7)  and  Condition  4.5,  it  follows  that 


^p+i,o!?p+i>o  ^  fjp.o  'ipp.i  A  Pi^p,o^p:o9p.o 

>  PlUTVT\\gP,o\\2 

>  piOiT'nT^3~p)\\gj,i\\2 


which  proves  inequality  (4.26).  Based  on  the  curvature  condition  (3.9b)  and  the 
Lipschitz  continuity  of  \7iph,  the  step  size  ap+i,o  is  bounded  from  below 


ttp+1,0  >  ~  (1 


^J+i,o5p+i,o 

L\\dp+i,o\\2  ' 


From  Lemma  4.15,  we  obtain 


otP+  i,o  >  (1  —  P2) 


piarriTK 


2(j~p) 


\\m\ 


L  y2 


which  proves  ineciuality  (4.27).  Now  suppose  inequalities  (4.26)  and  (4.27)  hold  for 
p  <  q  <  j  —  1;  we  prove  that  they  also  hold  for  q  +  1.  As  in  the  case  of  q  =  p  +  1,  we 
have 


— dj+10gq+ifi  >  —  V’g,  1  >  -~p\aqfid^0gqfi 

^  (2i-1-l)  fl  -  P2  V2  }  {PlUTllTK2{j~p)\\gj,l\\2)2' 

~  PlP 1  - 

f  PiQ_ -  P2)\ (2  _1)  {piaTPTH2{3^p)\\g3A2)2' 

'  V  L  J  72*-2 

^  / PiO-  -  P2)\  (2  _1)  (piaTbTN2(j~p)||gj,;||2)2' 

\  L  J  72i+1-2 


Using  Lemma  4.15  again,  we  obtain  inequality  (4.27). 

2.  For  iteration  (j,  l),  inequalities  (4.26)  and  (4.27)  hold  by  simply  repeating,  in 
an  analogous  fashion,  the  above  proof.  □ 

Now  we  establish  the  global  convergence  of  Algorithm  3. 

Theorem  4.17.  Suppose  Condition  ^.5  is  satisfied  by  all  direct  search  steps. 
Then  in  Algorithm  3  at  the  uppermost  level 


liminffc^0O||V/N(xN,fc)||  =  0. 


Proof.  The  proof  is  by  contradiction.  Assume  that  ||giv,fc||  is  bounded  away  from 
zero;  that  is,  there  is  a  constant  e  >  0  such  that 

(4.28)  ||V/N,fc||  =  ||gN,fe||  >  e  >  0,  for  all  k  sufficiently  large. 

From  Condition  4.5  and  Lemma  4.16,  it  follows  that  each  iteration  satisfies 

(4-29)  —  aN,A^N,fcgN,fc  >  cr||gN)fc|r, 
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where  a  is  a  positive  constant  and  the  order  i  can  only  be  selected  from  a  finite  set 
of  integers  {21, 22,  •  •  •  ,  2N~N°+1},  whether  or  not  the  direction  d^,k  is  a  direct  search 
direction  or  a  recursive  search  direction.  Let  Bj  be  the  subset  of  indices  k  such  that 
inequality  (4.29)  is  satisfied  with  power  i  =  2J.  From  the  Armijo  condition  (3.9a),  we 
have 


V’N ,k  —  V’N.fc+l  >  — ttN.fcdN.feSN.fc- 

Summing  over  k  and  taking  limits,  we  obtain 

oo  N— No 

oo  >  V'N.O  -  V’N.OO  >  -  Y,  QN,fcdN.fcgN,fc  >  Y  Y  a\\9^ M]2'  • 
k— 0  j= 1  k£Bj 

Since  at  least  one  index  set  Bi  is  infinite,  we  have 

Y  ^IIsnaII2'  >  Y  ^  =  00’ 

feeBi  kGBi 

which  is  a  contradiction.  □ 

Remark  4.18.  Theorem  4-17  still  holds  if  the  bound  of  the  step  size  for  direct 
search  directions  in  Condition  4-5  is  relaxed  to  a^i  >  arllffyzll2- 

Theorem  4.17  shows  that  there  exists  a  subsequence  of  {zen,/,-}  converging  to  a 
stationary  point  zrjj  of  problem  (3.1).  However,  since  /n  is  convex  and  the  sequence 
/N(a;N,fc)  converges,  every  accumulation  point  of  {zen,*;}  is  a  global  optimal  solution 
of  problem  (3.1). 

Corollary  4.19.  Suppose  Condition  4-5  is  satisfied  by  all  direct  search  steps. 
Let  {iN,fc}  be  the  sequence  generated  by  Algorithm  3  at  the  uppermost  level.  Then  the 
whole  sequence  {V/n,/c}  converges  to  zero  and  every  accumulation  point  o/{a'N,t}  is 
a  global  optimal  solution  of  problem  (3.1). 

5.  Practical  Issues.  In  this  section,  we  discuss  some  other  components  of  the 
multigrid  method,  including  different  ways  to  generate  search  directions,  different 
strategies  for  doing  smoothing  steps  and  ways  to  define  prolongation  and  restriction 
operators.  Finally,  the  full  multigrid  method,  which  is  used  to  enhance  the  perfor¬ 
mance  of  the  multigrid  method  for  solving  PDEs,  is  extended  to  our  optimization 
context. 

5.1.  Direct  Search  Directions.  In  our  multigrid  algorithm,  the  direct  search 
direction  dhtk  can  be  computed  by  minimization  algorithms  (combined  with  a  proper 
line  search  scheme)  other  than  Newton’s  method  or  the  steepest  descent  method. 
These  methods  can  be  applied  to  compute  dh,k  without  any  difficulty  because  they 
are  totally  independent  of  any  information  from  previous  iterates  Xh,o ,  •■*  ,Xh.,k-i- 
History  dependent  methods,  such  as  nonlinear  conjugate  gradient  methods  and  quasi- 
Newton  methods,  need  to  be  carefully  tailored  here  since  dh,k-i  m&y  be  computed 
recursively.  One  simple  strategy  is  to  start  from  scratch  every  time  these  methods 
are  called  after  a  recursive  step.  But  how  to  utilize  these  incompatible  recursive  steps 
dh,i  needs  more  study. 

5.2.  Smoothing  Steps.  Traditionally,  the  use  of  smoothing  steps  is  motivated 
by  their  ability  to  smooth  errors.  Generally  speaking,  multigrid  methods  [19,  35] 
for  PDEs  will  not  converge  without  pre-  and  post-smoothing.  While  we  can  prove 
convergence  of  Algorithm  3  without  using  smoothing  steps  after  each  recursive  step, 
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we  computationally  explore  the  possibility  of  improving  the  efficiency  of  Algorithm  3 
by  incorporating  smoothing  steps  in  section  6. 

For  quadratic  functions,  we  only  require  smoothing  steps  to  satisfy  condition 
(2.10)  to  guarantee  global  convergence.  For  general  convex  functions,  we  require 
Condition  4.5  to  be  satisfied.  Adding  smoothing  steps  in  the  framework  of  Algorithm 
3  is  fairly  easy.  One  simple  strategy  is  to  do  some  smoothing  direct  search  steps  after 
a  recursive  step.  When  doing  smoothing,  we  don’t  want  to  spend  too  much  time, 
especially  for  the  expensive  problems  at  the  hirer  levels.  Several  iterations  of  the 
steepest  descent  method  or  a  conjugate  gradient  method  or  a  limited  memory  BFGS 
method  are  appropriate  in  this  case.  However,  in  our  preliminary  computational  tests, 
we  used  a  Newton’s  Method  as  a  smoother  but  only  solved  the  Newton  equation 
inexactly. 

5.3.  Prolongations  and  Restrictions.  The  mechanics  of  prolongation  and 
restriction  are  similar  to  their  counterparts  in  multigrid  methods  for  PDEs  [19,  34,  35]. 
For  example,  the  nine-point  prolongation  Ph  is  often  represented  by  the  stencil 

1/4  1/2  1/4" 

1/2  1  1/2  . 

1/4  1/2  1/4 

We  require  that  the  restriction  Rh  be  compatible  with  the  prolongation  Ph  in  the 
sense  that  (JhPh  =  Rh-  First,  define  the  discretized  version  of  the  continuous  L2  inner 
product  at  level  h 

(5-1)  (uh,vh)h=u)hUvh  ^  uh(i,j)  vh(i,j), 

where  the  scaling  factor  [19,  34]  allows  us  to  compare  the  grid  functions  on 

different  grids  and  the  corresponding  continuous  function  on  f2^.  Let  Uh  =  Ph'UH , 
then 


(' Uh,Vh)h  =  ( PhUH,Vh)h  =  ( UHlPhVh)H , 

where  P£  is  the  adjoint  of  Ph  with  respect  to  the  inner  product  (5.1).  Therefore, 
restriction  can  be  defined  as 


Rh  =  PI 

For  the  nine  point  prolongation,  we  have  Rh  =  \Ph  where  P^  is  the  transpose  of  Ph- 

5.4.  Full  Multigrid  Method.  The  basic  multigrid  method  solves  problem  (3.1) 
by  calling  £N,i*  =  MLS{ N,  Xn.OjO)-  Since  starting  from  a  good  initial  point  usually 
reduces  the  total  number  iterations  required,  the  idea  underlying  the  “full  multigrid 
method”  is  the  use  of  the  multilevel  approach  itself  to  provide  a  good  initial  point. 
Suppose  we  start  at  a  level  No  where  the  discretized  problem  is  very  easily  solved. 
We  interpolate  this  solution  to  the  next  finer  level  h  +  1  as  an  initial  approximation. 
Thereafter,  Algorithm  3  is  applied  to  the  discretized  problem  at  level  h  +  1.  This 
process  is  then  repeated  over  and  over  until  we  reach  the  uppermost  level.  The 
detailed  algorithm  is  as  follows. 

Algorithm  4.  Full  Multigrid  Method  FMLS 
Step  1.  For  h  =  Nq  <  N,  set  initial  approximation  Xh 
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Fig.  5.1.  An  illustration  of  the  full  multigrid  Algorithm  4 


1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19 
Step  2.  For  fc  =  0, 1,2,  ■  ■  •  ,N-1 

2.1.  Interpolate  Xh  to  the  next  finer  working  level  Xh+  i,o  =  PhXh- 

2.2.  Call  Xh+i  =  MLS(h+ 1,  Xh+i,o,  0)  starting  with  Xh+i,o,  i.e. ,  apply  multigrid 
Algorithm  3  to  solve  the  discretized  problem  on  level  h  +  1 

min  fh+i(xh+i) 


To  illustrate  the  full  multigrid  Algorithm  4,  we  simulate  some  steps  of  the  algo¬ 
rithm  running  on  a  problem  from  level  1  to  level  3  and  show  the  relationship  between 
the  level  and  the  iteration  history  in  Figure  5.1.  The  x-axis  denotes  the  index  of  the 
iteration  in  the  whole  minimization  procedure  whereas  the  y-axis  denotes  the  level 
at  which  the  minimization  procedure  takes  places  at  a  given  iteration.  We  denote 
a  direct  search  direction  by  — >,  a  recursive  search  direction  by  a  prolongation 
operation  by  and  a  restriction  operation  by  We  mark  each  iteration  in  each 
minimization  sequence  by  a  circle  and  also  record  its  order  in  the  corresponding  min¬ 
imization  sequence.  For  example,  there  are  two  minimization  sequences  on  level  2 
that  we  applied  to  functions  that  differ  by  only  a  linear  term.  The  first  minimization 
sequence  on  level  2  first  resorts  to  the  coarse  level  model  and  initializes  a  minimization 
sequence  on  that  level  to  compute  a  recursive  step.  This  recursive  step  direction  is 
marked  by  from  the  point  0  to  the  point  1  at  level  2.  At  the  point  1,  a  direct 
search  direction  has  to  be  called  to  compute  the  point  2  since  the  switching  condition 
(3.4)  fails.  It  is  possible  that  the  switching  condition  (3.4)  still  fails  at  point  the  2, 
then  another  direct  search  direction  has  to  be  computed.  This  procedure  continues 
until  the  convergence  condition  is  satisfied  and  the  algorithm  interpolates  the  solution 
to  level  3  as  an  initial  point. 

6.  Numerical  Tests. 

6.1.  Test  Problems  and  Discretization  Issues.  In  this  section,  we  apply  our 
multigrid  approach  to  infinite  dimensional  unconstrained  minimization  problems  of 
the  form 

(6.1)  vam.4F{u)  =  /  C(\7u,u,x)  dx 

utu  J  n 
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Table  6.1 

Variants  of  Algorithm 


Name 

Direct  Search  Direction 

Smoothing  Steps 

NEWTON 

PCG 

0 

FMLSO-PCG 

PCG 

0 

FMLS1-PCG 

PCG 

1 

FMLSO-MG 

MG 

0 

FMLS1-MG 

MG 

1 

subject  to  proper  boundary  conditions,  where  U  is  the  functional  space  wherein  u 
resides.  Such  problems  arise  in  a  wide  range  of  applications,  such  as  variational 
formulations  of  nonlinear  PDEs,  image  processing  problems,  optimal  control  problems 
and  inverse  problems. 

For  the  sake  of  simplicity,  only  the  finite  difference  case  and  only  the  two  dimen¬ 
sional  case  of  a  simple  domain  Q  =  [0, 1]  x  [0, 1]  with  Dirichlet  boundary  conditions 
are  treated.  Neumman  conditions  and  other  boundary  conditions  can  be  handled  in 
a  similar  way  by  first  eliminating  the  boundary  nodes.  We  discretize  fi  at  level  li  as 
a  square  grid 

=  {( i,j )  d=  {xuVj)  I  Xi  =  iul,yj  =juvh,i  =  0, 1,  •  -  -  ,n%;j  =  0, 1,  •  ■  •  ,nvh}, 

where  the  mesh  size  u>^  =  1/n^  and  u>\  =  1  /nvh  and  we  take  =  nvh  =  2h  for  the 

sake  of  simplicity.  Then  the  objective  functional  is  discretized  as 

(6.2)  F(u)  =  'y  '  y  1  ui,j  >  ui,j  "b  £(5X  ui,j  >  ui,j  ■ 

i— 0  j— 0 

where  5xUij,5yUij  and  S~Uij,5~Uij  are,  respectively,  the  forward  and  backward 
finite  differences  with  respect  to  x  and  y. 

6.2.  Performance  of  the  Multigrid  Methods.  We  mainly  focus  on  the  per¬ 
formance  of  the  full  multigrid  Algorithm  4  with  zero  or  one  smoothing  step.  When 
we  specify  that  a  particular  version  of  Algorithm  3  does  k  smoothing  steps,  we  mean 
that  before  considering  doing  a  recursive  step,  the  algorithm  first  takes  k  direct  search 
steps.  It  may  take  additional  direct  search  steps  if  the  test  for  doing  a  recursive  step 
is  not  met.  The  direct  search  directions  on  the  coarsest  level  are  obtained  by  fac¬ 
torizing  the  Hessian.  The  direct  search  directions  on  other  levels  are  computed  by 
a  preconditioned  conjugate  gradient  method  (PCG)  using  an  incomplete  Cholesky 
factorization  pre-conditioner.  We  denote  the  version  of  the  method  that  does  not  use 
smoothing  steps  by  “FMLSO-PCG”  and  the  version  that  uses  one  smoothing  step  by 
“FMLS1-PCG” .  If  the  direct  search  directions  on  all  but  the  coarsest  level  are  com¬ 
puted  by  the  multigrid  Algorithm  2  (MG)  for  solving  the  linear  equations,  we  denote 
these  methods  by  “FMLSO-MG”  and  “FMLS1-MG”,  respectively.  (See  Table  6.1  for 
a  summary  of  these  methods).  We  construct  the  matrices  A ^  for  the  various  levels  so 
that  Assumption  2.2  is  satisfied  for  the  linear  multigrid  method.  We  also  give  results 
obtained  using  Newton’s  Method  with  PCG  on  the  finest  level  for  a  comparison. 

In  our  test  problems  the  grid  spacing  is  set  to  2~3  at  the  coarsest  level  and  to  2-8 
at  the  finest  level.  This  gives  a  257  x  257  grid  on  the  finest  level.  How  we  computed  the 
gradient  and  the  Hessian  is  explained  in  the  Appendix.  The  initial  point  in  Algorithm 
4  is  taken  to  be  the  zero  vector.  For  the  multigrid  Algorithm  3,  we  set 

k  =  1CT4,  eh  =  10”4,  K  =  20,  pi  =  0.01,  p2  =  0.2. 
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Table  6.2 

Summary  of  computational  costs  for  Problem  6.3 


FMLSO-PCG 

FMLS1-PCG 

nh 

9* 

17* 

33* 

65* 

129* 

257* 

9* 

17* 

33* 

65* 

129* 

257* 

nls 

2 

4 

2 

1 

1 

1 

0 

1 

1 

1 

1 

1 

nge 

4 

8 

4 

2 

2 

2 

0 

2 

2 

2 

2 

2 

nhe 

2+ 

4 

2 

4 

5 

6 

0 

2 

3 

4 

5 

6 

IIsnIIz 

6.197382e-07 

6.196731e-07 

CPU 

6.787156 

6.736728 

FMLSO-MG 

FMLS1-MG 

nh 

9* 

17* 

33* 

65* 

129* 

257* 

9* 

17* 

33* 

65* 

129* 

257* 

nls 

3 

6 

3 

1 

1 

1 

1 

3 

2 

1 

1 

1 

nge 

6 

11 

5 

2 

2 

2 

2 

5 

3 

2 

2 

2 

nvc 

3+ 

6 

4 

2 

2 

2 

1+ 

4 

4 

2 

2 

2 

IIsnIIz 

1.028328e-05 

1.028328e-05 

CPU 

3.565317 

3.328619 

Newton  | 

nh 

nls 

nge 

nhe 

IlffNlb 

CPU 

257* 

i 

2 

20 

1.074113e-06 

8.10042 

Note  that  these  parameters  could  also  be  set  adaptively  for  each  level. 

The  algorithm  described  above  has  been  implemented  in  MATALAB  (Release 
7.3.0);  the  line  search  method  is  adapted  from  the  code  “DCSRCH”  [28]  with  an 
initial  step  size  of  one.  All  experiments  were  run  on  a  Dell  Precision  670  workstation 
with  an  Intel  xeon(TM)  3.4GHZ  CPU  and  6GB  of  RAM. 

6.2.1.  Nonlinear  PDE  1.  We  study  the  nonlinear  PDE  [2] 

,  —A  u  —  it2  =  fix )  in  H, 

(6-3)  n 

u  =  0  on  oil, 


where  fix)  =  x 6  and  D  =  [0, 1]  x  [0, 1].  The  corresponding  variational  problem  is 

r  i  i 

min F(u)  =  /  -|Vw|2  —  Tu3  —  f  u  dx. 

Jn  2  3 

The  stopping  tolerance  for  PCG  and  MG  is  1(D3. 

In  Tables  6. 2-6. 5,  we  summarize  the  computational  costs  of  the  various  methods 
on  this  and  the  other  problems.  Specifically,  rih  denotes  the  number  of  variables  on 
level  “h”;  “nls”,  and  “nge”  denote  the  total  number  of  line  searches  and  the  total 
number  of  gradient  evaluations  at  that  level,  respectively;  “nhe”  denotes  the  total 
number  of  Hessian  evaluations  on  the  coarsest  level  (marked  with  f)  and  the  total 
number  of  matrix-vector  products  on  other  levels  if  a  PCG  is  used;  for  all  levels  other 
than  the  coarsest,  “nvc”  denotes  the  total  number  of  multigrid  cycles  on  that  level; 
on  the  coarsest  level,  “nvc”  means  the  total  number  of  Hessian  evaluations  (marked 
with  f).  Since  the  total  number  of  objective  function  evaluations  is  equal  to  “nge”  in 
our  implementation,  we  do  not  include  it  in  the  table.  Both  PCG  methods  work  well 
for  this  problem  because  on  the  finer  levels,  only  one  Newton’s  step  is  needed. 

From  Table  6.2,  we  can  see  that  the  counts  “nls”,  “nge”  and  “nhe”  at  the  coarse 
levels  for  method  “FMLSO-PCG”  are  larger  than  those  at  the  finer  levels.  This  shows 
that  most  of  the  function  value,  gradient  and  Hessian  evaluations  occur  mainly  on 
the  coarser  levels.  Table  6.2  also  gives  the  total  CPU  time  measured  in  seconds  and 
the  accuracy  attained,  which  is  measured  by  the  2-norm  ||<7nI|2  of  the  gradient  at 
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the  final  iteration.  Both  PCG  methods  achieve  very  good  accuracy.  It  is  interesting 
to  note  that  the  difference  between  CPU  times  is  very  small.  This  confirmed  the 
property  of  multilevel  methods  that  their  cost  does  not  increase  very  much  if  there 
are  more  function,  gradient  and  Hessian  evaluations  at  the  coarser  levels.  We  can 
observe  similar  behavior  for  method  “FMLSO-MG”  and  method  “FMLS1-MG”.  All 
of  these  methods  only  take  one  Newton  step  at  the  finest  level. 

Table  6.2  shows  that  different  inexact  solvers  for  the  direct  search  direction  com¬ 
putation  lead  to  quite  different  results.  MG  performs  better  than  PCG  in  this  example 
in  terms  of  CPU  time.  Although  they  are  not  that  different  in  terms  of  “nls”  and 
“nge” ,  they  are  different  in  terms  of  the  operations  involving  the  Hessian  (however, 
it  is  hard  to  compare  the  two  quantities  “nhe”  and  “nvc”  directly). 

To  illustrate  the  multilevel  behavior  of  methods  “FMLSO-PCG”  and  “FMLS1- 
PCG”  we  plot  the  level  versus  iteration  history  for  them  in  Figure  6.1,  which  is 
a  simplified  version  of  Figure  5.1.  We  can  see  that  a  recursive  step  is  not  always 
performed;  it  is  only  performed  if  it  is  necessary.  Although  “nls”  indicates  that  there 
are  iterations  on  the  coarsest  level,  Figure  6.1  shows  that  the  multigrid  method  never 
returns  to  the  coarsest  level  once  the  minimization  procedure  is  running  on  the  finer 
levels. 

6.2.2.  Nonlinear  PDE  2.  We  study  the  nonlinear  PDE  [20]: 

/  —Au  +  A  ueu  =  f  in  H, 

(6-4)  n  no 

u  =  0  on  aS2, 

where 

/  =  (9tt2  +  Ae«x2-x3)sin(3,r^)(a;2  -  x3)  +  6x  -  2)  sin(37n/), 

A  =  10,  =  [0,1]  x  [0,1]  and  the  exact  solution  is  u  =  ( x 2  —  x3)  sin(37ry).  The 

corresponding  variational  problem  is 

miniF(u)  =  f  i|Vu|2  —  A [ueu  —  e“)  dx. 

Jn  2 
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Table  6.3 

Summary  of  computational  costs  for  Problem  6.4 


FMLSO-PCG 

FMLS1-PCG 

92 

17  2 

332 

652 

129^ 

257^ 

92 

17  2 

332 

652 

129^ 

257^ 

nls 

5 

6 

3 

2 

1 

1 

1 

3 

2 

1 

1 

1 

nge 

9 

12 

6 

4 

2 

2 

2 

5 

3 

2 

2 

2 

nhe 

5+ 

4 

5 

2 

3 

3 

1+ 

4 

4 

2 

3 

3 

IIsnIIs 

1.303290e-05 

1.362962e-05 

CPU 

6.211418 

6.172631 

FMLSO-MG 

FMLS1-MG 

Tlh 

92 

17  2 

33^ 

65^ 

129^ 

257^ 

92 

17  2 

33^ 

652 

129^ 

257^ 

nls 

5 

8 

5 

3 

1 

1 

1 

5 

3 

2 

1 

1 

nge 

9 

15 

9 

5 

2 

2 

2 

8 

5 

3 

2 

2 

nvc 

5t 

9 

6 

4 

2 

2 

1+ 

9 

4 

4 

2 

2 

IIsnIIz 

2.898190e-05 

2.898191e-05 

CPU 

3.091660 

2.855589 

Newton 

nh 

nls 

nge 

nhe 

IIsnI|2 

CPU 

2572 

2 

3 

32 

3.133664e-07 

14.39141 

Fig.  6.2.  Iteration  history  for  Problem  6. 4-  Left:  “FMLSO-PCG”;  Right:  “FMLS1-PCG” 


The  stopping  tolerance  for  PCG  and  MG  is  10-3.  From  Table  6.3,  both  “FMLSO- 
PCG”  and  “FMLS1-PCG”  work  well  because  both  methods  only  need  one  Newton 
step  at  the  finest  level.  But  “FMLSO-PCG”  spends  more  time  at  the  coarse  level 
especially  at  the  beginning  looking  for  a  good  initial  point.  Both  methods  achieve 
very  good  accuracy  and  the  difference  between  CPU  times  is  also  very  small.  From 
Figure  6.2,  we  see  that  the  multigrid  method  also  never  returns  to  the  coarsest  level 
once  the  minimization  procedure  is  running  on  levels  close  to  the  finest  level.  We  can 
observe  similar  behaviors  for  method  “FMLSO-MG”  and  method  “FMLS1-MG”.  MG 
performs  better  than  PCG  in  this  example  in  terms  of  CPU  time. 

6.2.3.  Minimal  Surface  Problem.  Consider  the  minimal  surface  problem 
min  f(u)  =  I  y/l  +  || Vu(cc)||2  dx 

Jn 

u(x)  €  K  =  {u  €  Lf1(G)  :  u(x)  =  uq(x)  for  x  €  <9G}, 


(6.5) 


s.t. 
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Table  6.4 

Summary  of  computation  costs  for  Problem  6.5  with  boundary  u]q 


FMLS0-PCG 

FMLS1-PCG  ] 

nh 

92 

172 

332 

652 

1292 

2572 

92 

172 

332 

652 

1292 

2572 

nls 

8 

11 

3 

2 

1 

1 

2 

5 

1 

1 

1 

1 

nge 

15 

23 

6 

4 

2 

2 

4 

10 

2 

2 

2 

2 

nhe 

8+ 

9 

5 

4 

4 

3 

2+ 

7 

3 

3 

4 

3 

IIsnIU 

2.261061e-06 

1.999010e-06 

CPU 

7.068329 

6.931453 

FMLSO-MG 

FMLS1-MG 

n-h 

92 

172 

332 

652 

1292 

2572 

92 

172 

332 

652 

1292 

2572 

Ills 

8 

13 

5 

3 

1 

1 

2 

6 

3 

2 

1 

1 

nge 

15 

26 

9 

5 

2 

2 

4 

12 

5 

3 

2 

2 

nvc 

8+ 

18 

6 

4 

2 

2 

2+ 

17 

4 

4 

2 

2 

IIsnI|2 

1.811500e-05 

1.811409e-05 

CPU 

3.982122 

3.809665 

Newton 

nh 

nls 

nge 

nhe 

IIsnIIz 

CPU 

2572 

5 

16 

73 

2.136139e-06 

38.81010 

where  Q  =  [0, 1]  x  [0, 1].  We  will  test  two  sets  of  boundary  data  [17,  29]: 


(Surfl):  u^x) 

(Surf2):  u2n(x) 


x(l-x),  y  =  0,1, 

0,  otherwise, 


— sin(2Try) , 
sin{2Tty), 
sin(  2nx), 
—sin(2.Ttx), 


x  =  0, 
x  =  1, 

y  =  o, 

y  =  l. 


The  stopping  tolerance  for  PCG  and  MG  is  10-3  for  boundary  condition  Uq(x)  while 
it  is  10”1 1| II  for  boundary  condition  Uq(x). 

From  Table  6.4,  both  “FMLS0-PCG”  and  “FMLS1-PCG”  work  well  for  the 
boundary  condition  Uq(x).  But  “FMLS0-PCG”  spends  more  time  at  the  coarser 
levels,  especially  at  the  beginning  looking  for  a  good  initial  point.  Both  methods 
achieve  very  good  accuracy  and  the  difference  between  CPU  times  is  also  very  small. 
From  Figure  6.3,  we  see  that  the  multigrid  method  also  never  returns  to  the  coarsest 
level  once  the  minimization  procedure  is  running  on  levels  close  to  the  finest  level.  We 
can  observe  similar  behaviors  for  method  “FMLS0-MG”  and  method  “FMLS1-MG”. 
MG  performs  better  than  PCG  in  this  example  in  terms  of  CPU  time. 

From  Table  6.5,  we  see  that  “FMLS0-PCG”  and  “FMLS1-PCG”  also  work  well, 
but  they  require  recursive  steps  quite  often  for  the  boundary  condition  u^x).  The 
difference  between  the  achieved  accuracy  is  small  but  the  difference  between  CPU 
time  is  not.  They  behave  like  two  level  methods  as  we  can  see  in  Figure  6.4.  This 
fact  is  of  great  advantage  because  the  bounds  estimated  in  the  convergence  analysis 
are  much  smaller  in  this  case.  Therefore,  we  can  still  anticipate  a  fast  convergence 
rate.  However,  “FMLS0-MG”  and  “FMLS1-MG”  do  not  work  well  in  this  case.  One 
reason  perhaps  is  that  the  linear  multigrid  method  we  implemented  can  not  handle 
the  nonlinearity  well  in  this  case. 

7.  Discussion.  The  multigrid  Algorithm  3  provides  an  automatic  way  to  alter¬ 
nate  between  recursive  steps  and  direct  search  steps.  Usually,  different  inexact  solvers 
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Table  6.5 

Summary  of  computation  costs  for  Problem  6.5  with  boundary  Uq 


FMLSO-PCG 

FMLS1-PCG 

nh 

92 

172 

332 

652 

1292 

2572 

92 

172 

332 

652 

1292 

2572 

nls 

16 

32 

23 

16 

12 

7 

3 

9 

6 

5 

4 

3 

nge 

31 

62 

46 

33 

25 

17 

5 

15 

12 

8 

7 

7 

nhe 

161 

47 

48 

54 

98 

26 

3+ 

20 

13 

23 

31 

24 

IIsnII2 

1.081637e-05 

3.840665e-05 

CPU 

29.53286 

20.78162 

FMLSO-MG 

FMLS1-MG 

nh 

92 

172 

332 

652 

1292 
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Fig.  6.3.  Iteration  history  for  Problem  6.5  with  boundary  .  Left:  “FMLSO-PCG”;  Right: 
“FMLS1-PCG” 


for  the  direct  step  leads  to  quite  different  results.  Our  algorithm  can  be  viewed  as 
a  combination  of  the  global  linearization  method  and  the  FAS  scheme  if  the  direct 
search  step  is  solved  by  the  linear  multigrid  method.  Applying  the  multigrid  frame¬ 
work  from  the  point  view  of  optimization  provides  us  with  new  opportunities  for 
designing  efficient  algorithms. 

Appendix.  Computing  Derivatives  Efficiently  .  For  the  sake  of  simplicity, 
we  only  consider  a  discretization  of  the  objective  functional  (6.1)  using  forward  finite 
differences 


n^  —  ln^  —  l  nh~  ^nh~ 1 

(A.l)  F{u)=uluvh  J2  J2  J2  Ltp 


2=0  j=  0 


2  =  0  j=  0 
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Fig.  6.4.  Iteration  history  for  Problem  6.5  with  boundary  u^.  Left:  “FMLSO-PCG”;  Right: 
“FMLS1-PCG” 


Iteration  History 


Iteration 


where  Lf-  =  L(Sfuij,  S+uL  Then,  the  gradient  of  F(u)  with  respect  to  interior 

points  are 

VF(1  :  <  -  1, 1  :  nl  -  1) 

,A2)  =  {9i(0-nxh-2,l:nvh-l)-g+{l:nl-l,l:nvh-l)) 

+  (dyO-  =  n%  -  1,0  :  nyh  -  2)  -£+(1  :  n%  -  1,1  :  nyh  -  1)) 

+  gu{l  1), 


where 


n-\- _ ,  ,y 

9x  ^ h 


dL(8~fu,  1 5yU,  u ) 
d(5+u)  : 


9y  =  “h 


dL(Sfu,  8y  u,  u ) 
9(Sy  U) 


>  9u  =  uluvh 


dL(5ffu,  5y  u,  u ) 
du 


Q2  L+  Q2  Lj+  L+ 

Similarly,  we  collect  partial  derivatives  ‘A,  ,  tvtt - - t,  n/f+  *|J,_  and 

d(SjUij)2  dlSZui'jidldyUi'j)1  d(SJuij)2 

d2L+.  . 

du2',J  into  the  matrices 


Q+ 


uvh  d2L(8+u,5+u,u) 
uxh  d(6+u )2 


ri+ 

^  xy 


d2L(5+u,SyU,u ) 
d(5xu)d(5yu) 


+  d2L(S+u,S+u,u)  d2L(6+u,5+u,u ) 

G»  =  4  awy*  ’  c““  =  — 


Then,  by  using  the  notation  in  the  Matlab  language  and  using  the  function  “spdiags” 
which  creates  sparse  matrix,  the  lower  triangular  part  G  of  the  Hessian  'S72F(u)  for 
the  interior  points  can  be  computed  as  follows. 


Algorithm  5.  Compute  the  lower  triangular  part  of  Hessian  \72F 
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Gi  =  (G+  (0  :  n%  -  2, 1  :  nyh  -  1)  +  G+  (1  :n*h-  1,1:  <  -  1)) 

+(G+ (1  :  <  -  1,  0  :  <  -  2)  +  G+ (1  :  <  -  1, 1  :  n\  -  1)) 

+2Gjy(l  :  —  1, 1  :  nvh  —  1)  +  Guu(  1  :  —  1, 1  :  nvh  —  1); 

G2  =  -G+  (1  :  <  —  1, 1  :  <  —  1)  —  G+ (1  :  <  -  1, 1  :  nyh  -  1); 

G3  =  G+(1:<-1,1:<-1); 

G4  =  -G+ (1  :  <  —  1,1  :  <  —  1)  —  G+ (1  :  <  -  1, 1  :  nyh  -  1); 

Gi  =  Gi  (:);  G2«-1,:)=0;  G2  =  G2(:); 

G4  =  G4(:);  G3«-1,:)=0;  G3  =  G3(:);  G3  =  [0;  G3(l  :  end  -  1)] 

G  =  spdiags([G4,  G3,  G2,  G4],  [-«  -  2),  -  3),  -1,  0],  «  -  2)2,  «  -  2)2); 


Matrix-vector  products  involving  the  Hessian  can  be  computed  as 

[\72F{u)]v  =  ^l  Y.  Y.V2LUv- 

i=0  j— 0 

Remark  A.l.  For  many  PDE  constrained  problems,  one  can  eliminate  the  state 
variable  and  transform  the  problem  into  an  unconstrained  one.  The  gradient  and 
matrix-vector  products  involving  the  Hessian  can  then  be  computed  by  the  adjoint 
method.  When  the  objective  function  is  in  the  form  of  least  squares  or  it  involves 
a  Total  Variation  regularization  term,  one  can  also  approximate  the  Hessian  by  a 
Gauss-Newton  approximation. 
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